Sensory-Aware Multimodal Fusion for Word Semantic Similarity Estimation

Authors

Georgios Paraskevopoulos

Giannis Karamanolakis

Elias Iosif

Aggelos Pikrakis

Alexandros Potamianos

Abstract

Traditional semantic models are disembodied from human perception and action. In this work, we attempt to address this problem by grounding the semantic representations of words in the acoustic and visual modalities. Specifically, we estimate multimodal word representations by fusing the auditory and visual modalities with the text modality. We employ middle and late fusion of representations, with a modality weight assigned to each unimodal representation. We also propose a fusion method that assigns different weights to each word, based on how relevant that word is to the audio and visual modalities. The proposed methods are evaluated on the task of computing semantic similarity between words. To our knowledge, this is the first work that combines text, audio and visual features to compute multimodal semantic word representations. Multimodal models outperform unimodal models, indicating the importance of multimodal fusion and perceptual grounding.
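The abstract names middle fusion, late fusion, per-modality weights, and a word-dependent weighting, but gives no formulas. The Python sketch below illustrates one plausible reading under stated assumptions; it is not the authors' formulation. Here "middle fusion" is assumed to mean concatenating L2-normalised unimodal vectors, each scaled by its modality weight, and comparing the results by cosine similarity. "Late fusion" is assumed to be a weighted sum of per-modality cosine similarities. The per-word relevance scores, the `sensory_aware_weights` rule, and the averaging in `pair_weights` are hypothetical placeholders, and all function names are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]
Reps = Dict[str, Vector]      # modality name -> unimodal word vector
Weights = Dict[str, float]    # modality name -> fusion weight


def unit(v: Vector) -> Vector:
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def middle_fusion(reps: Reps, weights: Weights) -> Vector:
    """Assumed middle fusion: concatenate the L2-normalised unimodal vectors,
    each scaled by its modality weight (modalities taken in sorted order)."""
    fused: Vector = []
    for m in sorted(reps):
        fused.extend(weights[m] * x for x in unit(reps[m]))
    return fused


def late_fusion_similarity(a: Reps, b: Reps, weights: Weights) -> float:
    """Assumed late fusion: weighted sum of per-modality cosine similarities."""
    return sum(w * cosine(a[m], b[m]) for m, w in weights.items())


def sensory_aware_weights(base: Weights, relevance: Weights) -> Weights:
    """Hypothetical per-word weighting: scale each modality's base weight by
    the word's relevance score for that modality (missing scores default to
    1.0, e.g. for text), then renormalise so the weights sum to 1."""
    scaled = {m: w * relevance.get(m, 1.0) for m, w in base.items()}
    total = sum(scaled.values())
    return {m: w / total for m, w in scaled.items()}


def pair_weights(wa: Weights, wb: Weights) -> Weights:
    """Hypothetical pair weighting: average the two words' modality weights."""
    return {m: (wa[m] + wb[m]) / 2 for m in wa}


if __name__ == "__main__":
    # Toy 2-d vectors, invented purely for illustration.
    base = {"text": 0.5, "audio": 0.25, "visual": 0.25}
    a = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "visual": [1.0, 1.0]}
    b = {"text": [1.0, 0.0], "audio": [1.0, 0.0], "visual": [1.0, 1.0]}
    print(late_fusion_similarity(a, b, base))                      # ≈ 0.75
    print(cosine(middle_fusion(a, base), middle_fusion(b, base)))  # ≈ 0.833
    # A word judged irrelevant to audio loses its audio weight.
    print(sensory_aware_weights(base, {"audio": 0.0, "visual": 1.0}))
```

In this toy setup the two fusion schemes disagree (about 0.75 versus 0.833) because middle fusion renormalises the concatenated vector, so each modality's weight enters the cosine roughly squared. This is one reason the choice of fusion level matters even with identical modality weights.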

Similar resources

Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity

Objective: Brain trauma evidences suggest that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that the verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...

Multimodal Word Distributions

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic i...

Network-Based Distributional Semantic Models

In this thesis, the unsupervised creation of language-agnostic Distributional Semantic Models (DSMs) using web harvested data is investigated for the problem of semantic similarity estimation. Semantic similarity can be regarded as the building block for numerous tasks of Natural Language Processing, e.g., affective text analysis and paraphrasing. The first part of the thesis deals with the con...

A comprehensive model of spoken word recognition must be multimodal: Evidence from studies of language mediated visual attention

When processing language, the cognitive system has access to information from a range of modalities (e.g. auditory, visual) to support language processing. Language mediated visual attention studies have shown sensitivity of the listener to phonological, visual, and semantic similarity when processing a word. In a computational model of language mediated visual attention, that models spoken wor...

Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings

Recently a “Bag-of-Audio-Words” approach was proposed [1] for the combination of lexical features with audio clips in a multimodal semantic representation, i.e., an Audio Distributional Semantic Model (ADSM). An important step towards the creation of ADSMs is the estimation of the semantic distance between clips in the acoustic space, which is especially challenging given the diversity of audio...

Publication date: 2017
